fix: short-circuit list_schemas to skip ~500x storm in before_run #9671
Merged
PR Summary (Medium Risk). Overview: Makes Trino schema creation idempotent by using … Reviewed by Cursor Bugbot for commit 98da16e.
Branch updated from 98da16e to b6185a1.
dbt-core's RunTask.create_schemas builds required_databases as a Set[BaseRelation] from .include(database=True, schema=False, identifier=False). BaseRelation.__hash__ is hash(render()), but __eq__ compares to_dict(). Because .include() does not clear the underlying schema/identifier fields, every database-only relation for the same database hashes identically yet compares unequal, so the set keeps every entry. dbt then dispatches one adapter.list_schemas(database) future per (db, schema) pair touched by the run: ~500 identical 'select schema_name from <db>.information_schema.schemata' queries during before_run on the spellbook hourly run (~5 min wall time).

Always return [] from list_schemas. dbt-core then falls through to dispatching create_schema for each unique (db, schema) string tuple (that dedup uses string sets, not BaseRelation, and works correctly).

Update trino__create_schema to use CREATE SCHEMA IF NOT EXISTS for the hive branch, so the dispatch is a metastore-cheap no-op (a single getDatabase call) for existing schemas instead of a full information_schema.schemata scan.

Upstream fix in dbt-adapters: dbt-labs/dbt-adapters#1930. Drop this workaround once #1930 ships in our pinned dbt-adapters version.
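The fallback path can be modeled in a few lines: list existing schemas once per database, then dispatch create_schema for every (db, schema) string tuple not already present. This is a simplified, hypothetical sketch; plan_create_schema and the two listing stand-ins are illustrative names, not dbt-core functions, and the model skips threading, quoting, and null handling.

```python
from typing import Callable, Iterable, List, Set, Tuple

# A (database, schema) pair as plain strings.
Pair = Tuple[str, str]


def plan_create_schema(
    required: Iterable[Pair],
    list_schemas: Callable[[str], List[str]],
) -> List[Pair]:
    """Return the (db, schema) pairs that would get a create_schema dispatch.

    Hypothetical, simplified model of the flow described above; not dbt-core code.
    """
    pairs = list(required)
    existing: Set[Pair] = set()

    # Intended behavior: one listing per unique database string.
    for db in sorted({db.lower() for db, _ in pairs}):
        existing.update((db, s.lower()) for s in list_schemas(db))

    to_create: List[Pair] = []
    for db, schema in pairs:
        key = (db.lower(), schema.lower())
        # Plain string tuples dedupe correctly, so each pair is dispatched at most once.
        if key not in existing:
            existing.add(key)
            to_create.append(key)
    return to_create


def listing_finds_all(db: str) -> List[str]:
    # Hypothetical stand-in for a real information_schema.schemata listing.
    return ["schema_0", "schema_1", "schema_2"]


def short_circuited(db: str) -> List[str]:
    # Hypothetical stand-in for the workaround: list_schemas always returns [].
    return []


required = [("hive", f"schema_{i % 3}") for i in range(6)]  # 3 unique pairs, each twice
print(plan_create_schema(required, listing_finds_all))  # []
print(plan_create_schema(required, short_circuited))
# [('hive', 'schema_0'), ('hive', 'schema_1'), ('hive', 'schema_2')]
```

With list_schemas short-circuited, every unique pair is dispatched exactly once, which is why the hive branch of trino__create_schema needs IF NOT EXISTS to stay a cheap no-op for schemas that already exist.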
Branch updated from b6185a1 to 2b4187b.
jeff-dude approved these changes on May 14, 2026.
Summary
Every dbt run against prod issues ~500 identical 'select schema_name from hive.INFORMATION_SCHEMA.schemata' queries during before_run (~5 min wall time on the hourly run). The root cause is a Set[BaseRelation] dedup bug in dbt-adapters: __hash__ is render-based but __eq__ is to_dict-based, and .include(schema=False, identifier=False) doesn't clear the underlying path fields, so the set keeps every entry instead of one per unique database.

Upstream fix: dbt-labs/dbt-adapters#1930. This PR is a project-side workaround until that lands and ships.
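To make the dedup failure concrete, here is a minimal Python sketch. Relation is a hypothetical stand-in, not dbt's BaseRelation; it keeps only the behavior described above: include() changes the render policy but keeps the path fields, __hash__ hashes render(), and __eq__ compares every field (standing in for to_dict()).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True, eq=False)
class Relation:
    """Hypothetical stand-in for dbt's BaseRelation, reduced to the behavior at issue."""

    database: str
    schema: Optional[str] = None
    identifier: Optional[str] = None
    include_database: bool = True
    include_schema: bool = True
    include_identifier: bool = True

    def include(self, database: bool = True, schema: bool = True, identifier: bool = True) -> "Relation":
        # Only the render policy changes; the schema/identifier values are kept.
        return replace(
            self,
            include_database=database,
            include_schema=schema,
            include_identifier=identifier,
        )

    def render(self) -> str:
        parts = [
            (self.database, self.include_database),
            (self.schema, self.include_schema),
            (self.identifier, self.include_identifier),
        ]
        return ".".join(value for value, included in parts if included and value is not None)

    def __hash__(self) -> int:
        # Hash covers only the rendered name...
        return hash(self.render())

    def __eq__(self, other: object) -> bool:
        # ...but equality compares every field, including the hidden schema.
        if not isinstance(other, Relation):
            return NotImplemented
        return vars(self) == vars(other)


# One relation per (db, schema) pair touched by a run, all in one database.
required_schemas = {Relation("hive", f"schema_{i}") for i in range(500)}

required_databases = {
    r.include(database=True, schema=False, identifier=False) for r in required_schemas
}
print(len(required_databases))                   # 500, so 500 list_schemas calls
print({r.render() for r in required_databases})  # {'hive'}
```

Every database-only copy shares hash("hive") but compares unequal because its hidden schema differs, so the set stores all 500. Deduplicating on plain strings instead, e.g. {r.database for r in required_schemas}, yields one entry per database, which is the intended behavior.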
Change
- dbt_macros/dune/no-relation-listing.sql: list_schemas now always returns []. dbt-core falls through to dispatching create_schema for each unique (db, schema) string tuple (that dedup uses native string sets, not BaseRelation, and works correctly).
- dbt_macros/dune/schema.sql: adds IF NOT EXISTS to the hive branch of trino__create_schema so the fallback dispatch is a metastore-cheap no-op (a single getDatabase Thrift call) on already-existing schemas instead of a full information_schema.schemata scan.

Expected effect
Before:
After:
Pulled from production debug.log (dbt_cloud_run_id=70471878875888).

Risk
list_schemas returning [] would lie to any other caller that depends on knowing which schemas exist. Spellbook has no such caller (a grep of sources/, dbt_subprojects/, and dbt_macros/ finds none). The only Python caller in dbt-core is RunTask.create_schemas, which is precisely the path we want to short-circuit. CREATE SCHEMA IF NOT EXISTS ... WITH (location=...) is valid Trino syntax (the WITH clause is ignored for existing schemas).

Once dbt-labs/dbt-adapters#1930 ships in our pinned dbt-adapters version, this macro should be reverted to its dispatched form.